Compressed sensing is a novel research area, which was introduced in 2006, and since then has already become a key concept in various areas of applied mathematics, computer science, and electrical engineering. It surprisingly predicts that high-dimensional signals, which allow a sparse representation by a suitable basis or, more generally, a frame, can be recovered from what was previously considered highly incomplete linear measurements by using efficient algorithms. This article shall serve as an introduction to and a survey about compressed sensing.
1 Introduction


The area of compressed sensing was initiated in 2006 by two groundbreaking papers, namely [18] by Donoho and [11] by Candès, Romberg, and Tao. Nowadays, after only 6 years, an abundance of theoretical aspects of compressed sensing has been explored in more than 1000 articles. Moreover, this methodology is to date extensively utilized by applied mathematicians, computer scientists, and engineers for a variety of applications in astronomy, biology, medicine, radar, and seismology, to name a few.

The key idea of compressed sensing is to recover a sparse signal from very few non-adaptive, linear measurements by convex optimization. Taking a different viewpoint, it concerns the exact recovery of a high-dimensional sparse vector after a dimension reduction step. From yet another standpoint, we can regard the problem as computing a sparse coefficient vector for a signal with respect to an overcomplete system. The theoretical foundation of compressed sensing has links with and also explores methodologies from various other fields such as applied harmonic analysis, frame theory, geometric functional analysis, numerical linear algebra, optimization theory, and random matrix theory.

It is interesting to notice that this development, the problem of sparse recovery, can in fact be traced back to earlier papers from the 90s such as [23] and later the prominent papers by Donoho and Huo [21] and Donoho and Elad [19]. When the two fundamental papers introducing compressed sensing mentioned above were published, the term 'compressed sensing' was initially utilized for random sensing matrices, since those allow for a minimal number of non-adaptive, linear measurements. Nowadays, the terminology 'compressed sensing' is more and more often used interchangeably with 'sparse recovery' in general, which is a viewpoint we will also take in this survey paper.

1.1 The Compressed Sensing Problem 

To state the problem mathematically precisely, let $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ be our signal of interest. As prior information, we either assume that $x$ itself is sparse, i.e., it has very few non-zero coefficients in the sense that

$$\|x\|_0 := \#\{i : x_i \neq 0\}$$

is small, or that there exists an orthonormal basis or a frame$^1$ $\Phi$ such that $x = \Phi c$ with $c$ being sparse. For this, we let $\Phi$ be the matrix with the elements of the orthonormal basis or the frame as column vectors. In fact, a frame typically provides more flexibility than an orthonormal basis due to its redundancy and hence leads to improved sparsifying properties; therefore, frames are customarily employed more often than orthonormal bases in this setting. Sometimes the notion of sparsity is weakened; until we make this precise in Section 2, we will refer to this as approximately sparse. Further, let $A$ be an $m \times n$ matrix, which is typically called the sensing matrix or measurement matrix. Throughout, we will always assume that $m < n$ and that $A$ does not possess any zero columns, even if not explicitly mentioned.
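The redundancy of a frame can be made concrete with the smallest classical example. The following is a minimal sketch, assuming the so-called 'Mercedes-Benz' frame of three unit vectors 120° apart in $\mathbb{R}^2$ (an illustrative choice, not taken from the text). It computes the frame operator $S = \sum_i \varphi_i \varphi_i^T$. The extreme eigenvalues of $S$ are the optimal frame bounds $A \le B$ of footnote 1. The function names are hypothetical.

```python
import math


def frame_operator(vectors):
    """Frame operator S = sum_i phi_i phi_i^T for a finite list of vectors in R^2."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        for r in range(2):
            for c in range(2):
                s[r][c] += v[r] * v[c]
    return s


def frame_bounds_2d(vectors):
    """Optimal frame bounds (A, B): the extreme eigenvalues of the symmetric 2x2 frame operator."""
    (a, b), (_, d) = frame_operator(vectors)
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius


# Three unit vectors 120 degrees apart: a redundant frame for R^2 (3 vectors, dimension 2).
angles = [math.pi / 2 + 2 * math.pi * j / 3 for j in range(3)]
mercedes = [(math.cos(t), math.sin(t)) for t in angles]
A, B = frame_bounds_2d(mercedes)
print(round(A, 12), round(B, 12))  # 1.5 1.5 -> a tight frame

# Rescaling every vector by sqrt(2/3) yields a Parseval frame (A = B = 1).
parseval = [(math.sqrt(2 / 3) * p, math.sqrt(2 / 3) * q) for p, q in mercedes]
print(frame_bounds_2d(parseval))
```

An orthonormal basis gives $A = B = 1$ with only two vectors. The three frame vectors instead spread the energy evenly, so $S = \tfrac{3}{2} I$.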

Then the Compressed Sensing Problem can be formulated as follows: Recover $x$ from knowledge of

$$y = Ax,$$

or recover $c$ from knowledge of

$$y = A\Phi c.$$

In both cases, we face an underdetermined linear system of equations with sparsity as prior information about the vector to be recovered. We do not, however, know the support, since otherwise the solution could be obtained trivially.
This leads us to the following questions:

• What are suitable signal and sparsity models? 

• How, when, and with how much accuracy can the signal be algorithmically recovered? 

• What are suitable sensing matrices? 



$^1$ Recall that a frame for a Hilbert space $\mathcal{H}$ is a system $(\varphi_i)_{i \in I}$ in $\mathcal{H}$, for which there exist frame bounds $0 < A \le B < \infty$ such that $A\|x\|_2^2 \le \sum_{i \in I} |\langle x, \varphi_i \rangle|^2 \le B\|x\|_2^2$ for all $x \in \mathcal{H}$. A tight frame allows $A = B$. If $A = B = 1$ can be chosen, $(\varphi_i)_{i \in I}$ forms a Parseval frame. For further information, we refer to [12].



In this section, we will discuss these questions briefly to build up intuition for the subsequent 
sections. 



1.2 Sparsity: A Reasonable Assumption? 

As a first consideration, one might question whether sparsity is indeed a reasonable assumption. Due to the complexity of real data, only a heuristic answer is possible.

For natural images, it is well known that wavelets typically provide sparse approximations. This is illustrated in Figure 1, which shows a wavelet decomposition [50] of an exemplary image. It can clearly be seen that most coefficients are small in absolute value, indicated by a darker color.
Figure 1: (a) Mathematics building of TU Berlin (Photo by TU-Pressestelle); (b) Wavelet decomposition
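A one-dimensional toy version of this observation can be checked directly. The following is a minimal sketch, assuming the orthonormal Haar wavelet transform, which is the simplest wavelet system. The signal is piecewise constant, has length $2^6 = 64$, and was chosen purely for illustration. The code counts how many entries are non-zero before and after the transform. The helper names are hypothetical.

```python
import math


def haar_transform(signal):
    """Full orthonormal 1-D Haar wavelet transform; len(signal) must be a power of two."""
    coeffs = list(signal)
    length = len(coeffs)
    if length == 0 or length & (length - 1):
        raise ValueError("length must be a power of two")
    while length > 1:
        half = length // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:length] = approx + detail  # next level only transforms the approximation part
        length = half
    return coeffs


def count_significant(values, tol=1e-12):
    """Number of entries with magnitude above tol (a numerical stand-in for ||x||_0)."""
    return sum(1 for v in values if abs(v) > tol)


# Piecewise-constant signal of length 64 with jumps between positions 19/20 and 40/41.
signal = [1.0] * 20 + [4.0] * 21 + [-2.0] * 23
coeffs = haar_transform(signal)
print(count_significant(signal), count_significant(coeffs))  # prints: 64 10
```

Only detail coefficients whose support straddles a jump survive, plus the single coarse approximation coefficient. Because the transform is orthonormal, the energy $\|x\|_2^2$ is preserved while it concentrates in 10 of the 64 coefficients.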

Depending on the signal, a variety of representation systems that provide sparse approximations is available, and this variety is constantly expanding. In fact, it was recently shown that wavelet systems do not provide optimally sparse approximations in a regularity setting which appears to be suitable for most natural images, but the novel system of shearlets does [371116]. Hence, assuming some prior knowledge of the signal to be sensed or compressed, suitable, well-analyzed representation systems are typically already at hand. If this is not the case, more data-sensitive methods such as dictionary learning algorithms (see, for instance, [2]), in which a suitable representation system is computed for a given set of test signals, are available.

Depending on the application at hand, $x$ is often already sparse itself. Think, for instance, of digital communication, when a cell phone network with $n$ antennas and $m$ users needs to be modelled. Or consider genomics, when in a test study $m$ genes are to be analyzed with $n$ patients taking part in the study. In the first scenario, very few of the users have an ongoing call at a specific time; in the second scenario, very few of the genes are actually active. Thus, $x$ being sparse itself is also a very natural assumption.

In the compressed sensing literature, most results indeed assume that $x$ itself is sparse, and the problem $y = Ax$ is considered. Very few articles study the problem of incorporating a sparsifying orthonormal basis or frame; we mention specifically [614 E]. In this paper, we will also assume throughout that $x$ is already a sparse vector. It should be emphasized that 'exact' sparsity is often too restrictive or unnatural, and weakened sparsity notions need to be taken into account. On the other hand, sometimes, such as with the tree structure of wavelet coefficients, some structural information on the non-zero coefficients is known, which leads to diverse structured sparsity models. Section 2 provides an overview of such models.



1.3 Recovery Algorithms: Optimization Theory and More 

Let $x$ now be a sparse vector. It is quite intuitive to recover $x$ from knowledge of $y$ by solving

$$(P_0) \qquad \min_x \|x\|_0 \quad \text{subject to } y = Ax.$$

Due to the unavoidable combinatorial search, this problem is, however, NP-hard [53]. The main idea of Chen, Donoho, and Saunders in the fundamental paper [23] was to substitute the $\ell_0$ 'norm' by the closest convex norm, which is the $\ell_1$ norm. This leads to the following minimization problem, which they coined Basis Pursuit:

$$(P_1) \qquad \min_x \|x\|_1 \quad \text{subject to } y = Ax.$$
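The following sketch shows why $(P_0)$ calls for a combinatorial search. It solves $(P_0)$ by brute force: it tries all supports of size $0, 1, 2, \ldots$ and returns the first exact solution of $y = Ax$ it finds. It is an illustration, not a practical solver. It assumes small matrices stored as lists of rows and uses exact arithmetic via Python's `fractions` module; the function names are hypothetical. The number of supports examined grows like the binomial coefficients $\binom{n}{k}$.

```python
from fractions import Fraction
from itertools import combinations


def solve_on_support(A, y, support):
    """Solve A[:, support] z = y exactly by Gauss-Jordan elimination.

    Returns one solution z (free variables set to 0), or None if the system is inconsistent.
    """
    m, k = len(A), len(support)
    # Augmented matrix [A_S | y] over the rationals, so rounding cannot fake a solution.
    M = [[Fraction(A[i][j]) for j in support] + [Fraction(y[i])] for i in range(m)]
    pivots, row = [], 0
    for col in range(k):
        pivot = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: free variable
        M[row], M[pivot] = M[pivot], M[row]
        p = M[row][col]
        M[row] = [v / p for v in M[row]]
        for r in range(m):
            if r != row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[row])]
        pivots.append(col)
        row += 1
        if row == m:
            break
    # A zero row with a non-zero right-hand side means y is not in the span of A_S.
    if any(M[r][k] != 0 for r in range(row, m)):
        return None
    z = [Fraction(0)] * k
    for r, col in enumerate(pivots):
        z[col] = M[r][k]
    return z


def solve_p0(A, y):
    """(P0) by exhaustive search: the first support size admitting Ax = y is minimal."""
    n = len(A[0])
    for k in range(n + 1):
        for support in combinations(range(n), k):  # C(n, k) candidate supports
            z = solve_on_support(A, y, support)
            if z is not None:
                x = [Fraction(0)] * n
                for j, v in zip(support, z):
                    x[j] = v
                return x
    return None


A = [[1, 0, 0, 1, 1],
     [0, 1, 0, 1, -1],
     [0, 0, 1, 1, 2]]
x = solve_p0(A, [3, -3, 6])  # y = 3 * (fifth column)
print([str(v) for v in x])   # ['0', '0', '0', '0', '3']
```

For $y = (1, 3, 0)$ the search returns $(1, 3, 0, 0, 0)$, the first 2-sparse representation it meets. Another one exists, $2 \cdot (1,1,1) - (1,-1,2) = (1,3,0)$, which shows that minimizers of $(P_0)$ need not be unique once the sparsity is too large relative to $m$.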



Due to the shape of the $\ell_1$ ball, $\ell_1$ minimization indeed promotes sparsity. For an illustration of this fact, we refer the reader to Figure 2, in which $\ell_1$ minimization is compared to $\ell_2$ minimization. We would also like to draw the reader's attention to the small numerical example in Figure 3, in which a partial Fourier matrix is chosen as the measurement matrix.
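The sparsity-promoting effect can be verified by hand in the simplest underdetermined case: a single equation $\langle a, x \rangle = y$ with $a \in \mathbb{R}^n$, i.e., $m = 1$. Since $|y| \le \max_i |a_i| \cdot \|x\|_1$, the $\ell_1$ minimizer is the 1-sparse vector $(y/a_j) e_j$, where $j$ is an index maximizing $|a_j|$ (unique if that maximum is attained only once). The $\ell_2$ minimizer $a\, y / \|a\|_2^2$, by contrast, is typically fully dense. A minimal sketch of both closed forms follows; the vector $a$ is chosen for illustration and the function names are hypothetical.

```python
def l1_minimizer(a, y):
    """Minimizer of ||x||_1 subject to <a, x> = y (one equation): all mass on argmax |a_j|."""
    j = max(range(len(a)), key=lambda i: abs(a[i]))
    x = [0.0] * len(a)
    x[j] = y / a[j]
    return x


def l2_minimizer(a, y):
    """Minimum-norm solution of <a, x> = y, namely x = a * y / ||a||_2^2."""
    norm_sq = sum(v * v for v in a)
    return [v * y / norm_sq for v in a]


a = [1.0, 2.0, 4.0, 1.0]
y = 8.0
x1 = l1_minimizer(a, y)
x2 = l2_minimizer(a, y)
print(x1)  # [0.0, 0.0, 2.0, 0.0]
print(x2)  # every entry non-zero
```

Geometrically, the affine solution set first touches the $\ell_1$ ball at a vertex, which lies on a coordinate axis. It touches the round $\ell_2$ ball at a generic point instead.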



Figure 2: $\ell_1$ minimization versus $\ell_2$ minimization (panels: $\min \|x\|_1$ s.t. $y = Ax$ and $\min \|x\|_2$ s.t. $y = Ax$, each with the solution set $\{x : y = Ax\}$)

The general question of when '$\ell_0 = \ell_1$' holds is key to compressed sensing. Both necessary and sufficient conditions have been provided. They depend not only on the sparsity of the original vector $x$, but also on the incoherence of the sensing matrix $A$, which will be made precise in Section 3.

Since $\ell_1$ minimization is often not feasible for very large data sets, even when the solvers are adapted to the particular structure of compressed sensing problems, various other types of recovery algorithms were suggested. These can be roughly separated into convex optimization, greedy, and combinatorial algorithms (cf. Section 5), each one having its own advantages and disadvantages.




Figure 3: (a) Original signal $f$ with random sample points (indicated by circles); (b) The Fourier transform $\hat{f}$; (c) Perfect recovery of $\hat{f}$ by $\ell_1$ minimization; (d) Recovery of $\hat{f}$ by $\ell_2$ minimization



1.4 Sensing Matrices: How Much Freedom is Allowed? 

As already mentioned, sensing matrices are required to satisfy certain incoherence conditions such as, for instance, a small so-called mutual coherence. If we are allowed to choose the sensing matrix freely, the best choice is random matrices such as Gaussian i.i.d. matrices, uniform random ortho-projectors, or Bernoulli matrices; see, for instance, [11].

It is still an open question (cf. Section 4 for more details) whether deterministic matrices can be carefully constructed to have similar properties with respect to compressed sensing problems. At the moment, different approaches towards this problem are being taken, such as structured random matrices by, for instance, Rauhut et al. in [58] or [60]. Moreover, most applications do not allow for a free choice of the sensing matrix and enforce a particularly structured matrix. Exemplary situations are the application of data separation, in which the sensing matrix has to consist of two or more orthonormal bases or frames [32, Chapter 11], or high resolution radar, for which the sensing matrix has to bear a particular time-frequency structure [38].



1.5 Compressed Sensing: Quo Vadis? 

At present, a comprehensive core theory seems established, apart from a few deep questions such as the construction of deterministic sensing matrices exhibiting properties similar to random matrices.

One current main direction of research, in which various results already exist, is the incorporation of additional sparsity properties, typically coined structured sparsity; see Section 2 for references. Another main direction is the extension or transfer of the Compressed Sensing Problem to other settings such as matrix completion; see, for instance, [10]. Moreover, we are currently witnessing the diffusion of compressed sensing ideas to various application areas such as radar analysis, medical imaging, distributed signal processing, and data quantization, to name a few; see [32] for an overview. These applications pose intriguing challenges to the area due to the constraints they impose, which in turn give rise to novel theoretical problems. Finally, we observe that due to the need for, in particular, fast sparse recovery algorithms, there is a trend towards closer cooperation with mathematicians from other research areas, for example from optimization theory, numerical linear algebra, or random matrix theory.

As three examples of recently initiated research directions, we would like to mention the following. First, while the theory of compressed sensing focuses on digital data, it is desirable to develop a similar theory for the continuum setting. Two promising approaches have been suggested so far by Eldar et al. (cf. [52]) and Adcock et al. (cf. [1]). Second, in contrast to Basis Pursuit, which minimizes the $\ell_1$ norm of the synthesis coefficients, several approaches, such as recovery of missing data, minimize the $\ell_1$ norm of the analysis coefficients; see Subsections 6.1.2 and 6.2.2. The relation between these two minimization problems is far from clear, and the recently introduced notion of co-sparsity [53] is an interesting approach to shed light on this problem. Third, the utilization of frames as a sparsifying system in the context of compressed sensing has become a topic of increased interest, and we refer to the initial paper [9].



The reader might also want to consult the extensive webpage dsp.rice.edu/cs, which contains most published papers in the area of compressed sensing subdivided into different topics. We would also like to draw the reader's attention to the recent books [29] and [32] as well as the survey article [7].

1.6 Outline 

In Section 2, we start by discussing different sparsity models, including structured sparsity and sparsifying dictionaries. The next section, Section 3, is concerned with presenting both necessary and sufficient conditions for exact recovery using $\ell_1$ minimization as a recovery strategy. The delicate task of designing sensing matrices is the focus of Section 4. In Section 5, other algorithmic approaches to sparse recovery are presented. Finally, applications such as data separation are discussed in Section 6.

2 Signal Models 

Sparsity is the prior information assumed of the vector we intend to efficiently sense or 
whose dimension we intend to reduce, depending on which viewpoint we take. We will start 
by recalling some classical notions of sparsity. Since applications typically impose a certain 
structure on the significant coefficients, various structured sparsity models were introduced 
which we will subsequently present. Finally, we will discuss how to ensure sparsity through 
an appropriate orthonormal basis or frame. 



2.1 Sparsity 

The most basic notion of sparsity states that a vector has at most $k$ non-zero coefficients. This is measured by the $\ell_0$ 'norm', which for simplicity we will throughout refer to as a norm, although it is well known that $\|\cdot\|_0$ does not constitute a mathematical norm.

Definition 2.1 A vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ is called $k$-sparse, if

$$\|x\|_0 = \#\{i : x_i \neq 0\} \le k.$$

The set of all $k$-sparse vectors is denoted by $\Sigma_k$.

We wish to emphasize that $\Sigma_k$ is a highly non-linear set. A $k$-sparse signal $x \in \mathbb{R}^n$ belongs to the linear subspace consisting of all vectors with the same support set. Hence the set $\Sigma_k$ is the union of all subspaces of vectors with support $\Lambda$ satisfying $|\Lambda| \le k$.
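Both observations can be checked in a few lines. First, the sum of two $k$-sparse vectors is in general only $2k$-sparse, so $\Sigma_k$ is not a linear space. Second, $\Sigma_k$ is the union of the $\binom{n}{k}$ coordinate subspaces indexed by supports of size exactly $k$; subspaces with smaller support are contained in these. A minimal sketch, with vectors chosen for illustration and hypothetical helper names:

```python
from math import comb


def l0_norm(x):
    """||x||_0: the number of non-zero entries of x."""
    return sum(1 for v in x if v != 0)


def is_k_sparse(x, k):
    """Membership test for Sigma_k."""
    return l0_norm(x) <= k


n, k = 6, 2
u = [3, 0, 0, -1, 0, 0]
v = [0, 5, 0, 0, 0, 2]
w = [a + b for a, b in zip(u, v)]  # sum of two elements of Sigma_2
print(is_k_sparse(u, k), is_k_sparse(v, k), is_k_sparse(w, k))  # True True False
print(l0_norm(w))   # 4, i.e. 2k
print(comb(n, k))   # 15 supports of size k, hence 15 coordinate subspaces
```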

From an application point of view, the situation of $k$-sparse vectors is, however, unrealistic, which is why various weaker versions were suggested. In the following definition we present one possibility, but by no means do we claim this to be the most appropriate one. It is, though, very natural, since it analyzes the decay rate of the $\ell_p$ error of the best $k$-term approximation of a vector.

Definition 2.2 Let $1 \le p < \infty$ and $r > 0$. A vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ is called $p$-compressible with constant $C$ and rate $r$, if

$$\sigma_k(x)_p := \min_{\tilde{x} \in \Sigma_k} \|x - \tilde{x}\|_p \le C \cdot k^{-r} \quad \text{for any } k \in \{1, \ldots, n\}.$$
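The minimum in $\sigma_k(x)_p$ is attained by keeping the $k$ entries of largest magnitude and setting the rest to zero. Hence $\sigma_k(x)_p$ is simply the $\ell_p$ norm of the $n - k$ smallest-magnitude entries. A minimal sketch follows. The test vector $x_i = i^{-1}$ is an illustrative assumption; since $\sum_{i > k} i^{-2} < 1/k$, it is 2-compressible with $C = 1$ and $r = 1/2$.

```python
def best_k_term_error(x, k, p):
    """sigma_k(x)_p: l_p norm of x after removing its k largest-magnitude entries."""
    tail = sorted((abs(v) for v in x), reverse=True)[k:]
    return sum(t ** p for t in tail) ** (1.0 / p)


n = 1000
x = [1.0 / i for i in range(1, n + 1)]  # entries decaying like i^(-1)
errors = [best_k_term_error(x, k, 2) for k in range(1, n + 1)]
# Check Definition 2.2 with p = 2, C = 1, r = 1/2 for every k.
print(all(e <= k ** -0.5 for k, e in zip(range(1, n + 1), errors)))  # True
```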

2.2 Structured Sparsity 

Typically, the non-zero or significant coefficients do not arise in arbitrary patterns but are rather highly structured. Think of the coefficients of a wavelet decomposition, which exhibit a tree structure; see also Figure 1. To take these considerations into account, structured sparsity models were introduced. A first idea might be to identify the clustered set of significant coefficients [22]. An application of this notion will be discussed in Section 6.

In the following definition as well as in the sequel, for some vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ and some subset $\Lambda \subseteq \{1, \ldots, n\}$, the expression $1_\Lambda x$ will denote the vector in $\mathbb{R}^n$ defined by

$$(1_\Lambda x)_i = \begin{cases} x_i, & i \in \Lambda, \\ 0, & i \notin \Lambda, \end{cases} \qquad i = 1, \ldots, n.$$

Moreover, $\Lambda^c$ will denote the complement of the set $\Lambda$ in $\{1, \ldots, n\}$.

Definition 2.3 Let $\Lambda \subseteq \{1, \ldots, n\}$ and $\delta > 0$. A vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ is then called $\delta$-relatively sparse with respect to $\Lambda$, if

$$\|1_{\Lambda^c} x\|_1 \le \delta.$$
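The restriction $1_\Lambda x$ and the test of Definition 2.3 translate directly into code. The sketch below assumes 0-based indices, so $\Lambda$ is a subset of $\{0, \ldots, n-1\}$ rather than $\{1, \ldots, n\}$. The vector, whose mass is concentrated on a cluster $\Lambda$, and the function names are illustrative.

```python
def restrict(x, support):
    """(1_Lambda x)_i = x_i if i in Lambda, else 0 (0-based indices)."""
    support = set(support)
    return [v if i in support else 0.0 for i, v in enumerate(x)]


def is_relatively_sparse(x, support, delta):
    """delta-relative sparsity w.r.t. Lambda: ||1_{Lambda^c} x||_1 <= delta."""
    complement = set(range(len(x))) - set(support)
    return sum(abs(v) for v in restrict(x, complement)) <= delta


x = [0.01, 2.0, -1.5, 3.0, 0.02, -0.01]
cluster = {1, 2, 3}
print(restrict(x, cluster))                    # [0.0, 2.0, -1.5, 3.0, 0.0, 0.0]
print(is_relatively_sparse(x, cluster, 0.05))  # True: off-cluster l1 mass is about 0.04
print(is_relatively_sparse(x, cluster, 0.03))  # False
```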



The notion of $k$-sparsity can also be regarded from a more general viewpoint, which simultaneously imposes additional structure. Let $x \in \mathbb{R}^n$ be a $k$-sparse signal. Then it belongs to the sum of those one-dimensional coordinate subspaces (consisting of all vectors with exactly one non-zero entry) whose non-zero entry lies in the support set of $x$. The number of such subspaces is at most $k$. Thus, a natural extension of this concept is the following definition, initially introduced in



Definition 2.4 Let $(W_j)_{j=1}^N$ be a family of subspaces in $\mathbb{R}^n$. Then a vector $x \in \mathbb{R}^n$ is $k$-sparse in the union of subspaces $\bigcup_{j=1}^N W_j$, if there exists $\Lambda \subseteq \{1, \ldots, N\}$, $|\Lambda| \le k$, such that

$$x \in \sum_{j \in \Lambda} W_j.$$

At about the same time, the notion of fusion frame sparsity was introduced in [6]. Fusion frames are a set of subspaces having frame-like properties, thereby allowing for stability considerations. A family of subspaces $(W_j)_{j=1}^N$ in $\mathbb{R}^n$ is a fusion frame with bounds $A$ and $B$, if

$$A\|x\|_2^2 \le \sum_{j=1}^N \|P_{W_j}(x)\|_2^2 \le B\|x\|_2^2 \quad \text{for all } x \in \mathbb{R}^n,$$

where $P_{W_j}$ denotes the orthogonal projection onto the subspace $W_j$; see also [13] and [12, Chapter 13]. Fusion frame theory extends classical frame theory by allowing the analysis of signals through projections onto arbitrary dimensional subspaces, as opposed to one-dimensional subspaces in frame theory, hence serving also as a model for distributed processing; cf. [62]. Fusion frame sparsity can then be defined in a similar way as for a union of subspaces.

Applications such as manifold learning assume that the signal under consideration lives on a general manifold, thereby forcing us to leave the world of linear subspaces. In such cases, the signal class is often modeled as a non-linear $k$-dimensional manifold $\mathcal{M}$ in $\mathbb{R}^n$, i.e.,

$$x \in \mathcal{M} = \{f(\theta) : \theta \in \Theta\}$$

with $\Theta$ being a $k$-dimensional parameter space. Such signals are then considered $k$-sparse in the manifold model; see [65]. For a survey chapter about this topic, the interested reader is referred to [32, Chapter 7].

We wish to finally mention that applications such as matrix completion require generalizations of vector sparsity by considering, for instance, low-rank matrix models. This is, however, beyond the scope of this survey paper, and we refer to [32] for more details.

2.3 Sparsifying Dictionaries and Dictionary Learning 

If the vector itself does not exhibit sparsity, we are required to sparsify it by choosing an appropriate representation system, in this field typically called a dictionary. This problem can be attacked in two ways, either non-adaptively or adaptively.

If certain characteristics of the signal are known, a dictionary can be chosen from the vast class of already very well explored representation systems such as the Fourier basis, wavelets, or shearlets, to name a few. The achieved sparsity might not be optimal, but various mathematical properties of these systems are known and fast associated transforms are available.

Improved sparsity can be achieved by choosing the dictionary adaptively to the signals at hand. For this, a test set of signals is required, based on which a dictionary is learnt. This process is customarily termed dictionary learning. The most well-known and widely used algorithm is the K-SVD algorithm introduced by Aharon, Elad, and Bruckstein in [2]. However, from a mathematician's point of view, this approach bears two problems, both of which will hopefully be solved in the near future. First, almost no convergence results for such algorithms are known. And, second, the learnt dictionaries do not exhibit any mathematically exploitable structure, which not only makes an analysis very hard but also prevents the design of fast associated transforms.

3 Conditions for Sparse Recovery 

After having introduced various sparsity notions, in this sense signal models, we next consider which conditions we need to impose on the sparsity of the original vector and on the sensing matrix for exact recovery. As the sparse recovery method, we will focus on $\ell_1$ minimization, similar to most published results, and refer to Section 5 for further algorithmic approaches. In the remainder of the present section, several incoherence conditions for sensing matrices will be introduced. Section 4 then discusses examples of matrices fulfilling those. Finally, we mention that most results can be slightly modified to incorporate measurements affected by additive noise, i.e., if $y = Ax + \nu$ with $\|\nu\|_2 \le \varepsilon$.

3.1 Uniqueness Conditions for Minimization Problems 

We start by presenting conditions for uniqueness of the solutions to the minimization problems $(P_0)$ and $(P_1)$, which we introduced in Subsection 1.3.



3.1.1 Uniqueness of $(P_0)$

The appropriate condition on the sensing matrix is phrased in terms of the so-called spark, whose definition we first recall. This notion was introduced in [19], and its name fuses the words 'sparse' and 'rank'.

Definition 3.1 Let $A$ be an $m \times n$ matrix. Then the spark of $A$, denoted by $\operatorname{spark}(A)$, is the minimal number of linearly dependent columns of $A$.

It is useful to reformulate this notion in terms of the null space of $A$, which we will throughout denote by $\mathcal{N}(A)$, and to state its range. The proof is straightforward. For the definition of $\Sigma_k$, we refer to Definition 2.1.



Lemma 3.1 Let $A$ be an $m \times n$ matrix. Then

$$\operatorname{spark}(A) = \min\{k : \mathcal{N}(A) \cap \Sigma_k \neq \{0\}\}$$

and $\operatorname{spark}(A) \in [2, m+1]$.
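For small matrices, the spark can be computed directly from Definition 3.1 by testing subsets of columns of increasing size for linear dependence. The cost again grows combinatorially in $n$. The sketch below is illustrative only. It uses exact rational arithmetic (Python's `fractions` module) so that rank decisions are not affected by rounding. The example is a $3 \times 5$ Vandermonde matrix with distinct nodes, for which Section 4.2 states $\operatorname{spark}(A) = m + 1$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    cols = len(M[0]) if M else 0
    for c in range(cols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def spark(A):
    """Smallest number of linearly dependent columns of A (Definition 3.1)."""
    m, n = len(A), len(A[0])
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            sub = [[A[i][j] for j in cols] for i in range(m)]
            if rank(sub) < size:
                return size
    return None  # all columns independent, only possible if n <= m


nodes = [1, 2, 3, 4, 5]
m = 3
vandermonde = [[t ** i for t in nodes] for i in range(m)]  # row i holds t_j^i
s = spark(vandermonde)
print(s)  # 4 = m + 1, inside the range [2, m + 1] of Lemma 3.1
# Theorem 3.1 below: (P0) solutions with ||x||_0 <= k are unique iff k < spark(A)/2.
print(max(k for k in range(len(nodes) + 1) if k < s / 2))  # 1
```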



This notion enables us to derive an equivalent condition on unique solvability of $(P_0)$. Since the proof is short, we include it here.

Theorem 3.1 ([19]) Let $A$ be an $m \times n$ matrix, and let $k \in \mathbb{N}$. Then the following conditions are equivalent.

(i) If a solution $x$ of $(P_0)$ satisfies $\|x\|_0 \le k$, then this is the unique solution.

(ii) $k < \operatorname{spark}(A)/2$.



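Proof. (i) $\Rightarrow$ (ii). We argue by contradiction. If (ii) does not hold, by Lemma 3.1 there exists some $h \in \mathcal{N}(A)$, $h \neq 0$, such that $\|h\|_0 \le 2k$. Splitting the support of $h$ into two sets of size at most $k$, we obtain $x \neq \tilde{x}$ satisfying $h = x - \tilde{x}$ and $\|x\|_0, \|\tilde{x}\|_0 \le k$, but $Ax = A\tilde{x}$, a contradiction to (i).

(ii) $\Rightarrow$ (i). Let $x$ and $\tilde{x}$ satisfy $y = Ax = A\tilde{x}$ and $\|x\|_0, \|\tilde{x}\|_0 \le k$. Thus $x - \tilde{x} \in \mathcal{N}(A)$ and $\|x - \tilde{x}\|_0 \le 2k < \operatorname{spark}(A)$. By Lemma 3.1 it follows that $x - \tilde{x} = 0$, which implies (i).

□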

3.1.2 Uniqueness of $(P_1)$

Due to the underdeterminedness of $A$ and hence the ill-posedness of the recovery problem, the null space of $A$ also plays a particular role in the analysis of uniqueness of the minimization problem $(P_1)$. The related so-called null space property, first introduced in [15], is defined as follows.

Definition 3.2 Let $A$ be an $m \times n$ matrix. Then $A$ has the null space property (NSP) of order $k$, if, for all $h \in \mathcal{N}(A) \setminus \{0\}$ and for all index sets $|\Lambda| \le k$,

$$\|1_\Lambda h\|_1 < \tfrac{1}{2}\|h\|_1.$$
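For a fixed $h$, $\|1_\Lambda h\|_1$ is largest when $\Lambda$ collects the $k$ largest entries of $|h|$. The NSP condition for that $h$ therefore reduces to comparing the sum of its $k$ largest magnitudes with $\tfrac{1}{2}\|h\|_1$, and this comparison is invariant under scaling $h$. If $\mathcal{N}(A)$ is one-dimensional, as for an $(n-1) \times n$ matrix of full rank, a single spanning vector $h$ thus decides the NSP of every order exactly. The sketch below assumes such an $h$ is already known; the matrix and the function names are illustrative.

```python
def nsp_holds_for(h, k):
    """Check ||1_Lambda h||_1 < 0.5 * ||h||_1 for every index set with |Lambda| <= k."""
    magnitudes = sorted((abs(v) for v in h), reverse=True)
    worst_case = sum(magnitudes[:k])  # the maximizing Lambda holds the k largest entries
    return worst_case < 0.5 * sum(magnitudes)


def nsp_order(h):
    """Largest k for which the NSP of order k holds, assuming N(A) = span{h}."""
    k = 0
    while k < len(h) and nsp_holds_for(h, k + 1):
        k += 1
    return k


A = [[1, 0, 0, -1],
     [0, 1, 0, -1],
     [0, 0, 1, -1]]  # rank 3, so N(A) is one-dimensional
h = [1, 1, 1, 1]
assert all(sum(a * b for a, b in zip(row, h)) == 0 for row in A)  # h spans N(A)
print(nsp_order(h))  # 1
```

By Theorem 3.2 below, every 1-sparse vector is then the unique solution of its $(P_1)$ problem for this $A$. Since the NSP of order 2 fails, the same is not guaranteed for all 2-sparse vectors.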

An equivalent condition for the existence of a unique sparse solution of $(P_1)$ can now be stated in terms of the null space property. For the proof, we refer to [15].



Theorem 3.2 ([15]) Let $A$ be an $m \times n$ matrix, and let $k \in \mathbb{N}$. Then the following conditions are equivalent.

(i) If a solution $x$ of $(P_1)$ satisfies $\|x\|_0 \le k$, then this is the unique solution.

(ii) $A$ satisfies the null space property of order $k$.

It should be emphasized that [15] studies the Compressed Sensing Problem in a much 
more general way by analyzing quite general encoding-decoding strategies. 

3.2 Sufficient Conditions 

The core of compressed sensing is to determine when '$\ell_0 = \ell_1$', i.e., when the solutions of $(P_0)$ and $(P_1)$ coincide. The most well-known sufficient conditions for this to hold true are phrased in terms of mutual coherence and of the restricted isometry property.



3.2.1 Mutual Coherence 

The mutual coherence of a matrix, initially introduced in [21], measures the smallest angle between any pair of its columns.

Definition 3.3 Let $A = (a_i)_{i=1}^n$ be an $m \times n$ matrix. Then its mutual coherence $\mu(A)$ is defined as

$$\mu(A) = \max_{i \neq j} \frac{|\langle a_i, a_j \rangle|}{\|a_i\|_2 \|a_j\|_2}.$$

The mutual coherence of a matrix attains its maximal value 1 exactly when two columns are linearly dependent. The lower bound presented in the next result, also known as the Welch bound, is more interesting. It can be shown that it is attained by so-called optimal Grassmannian frames [63]; see also Section 4.

Lemma 3.2 Let $A$ be an $m \times n$ matrix. Then we have

$$\mu(A) \in \left[\sqrt{\frac{n-m}{m(n-1)}},\ 1\right].$$
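Definition 3.3 and Lemma 3.2 can be evaluated directly. The sketch below computes $\mu(A)$ over all column pairs and compares it with the Welch bound. The example is a $2 \times 3$ matrix whose columns are three unit vectors 120° apart, chosen here because it attains the bound. The matrix and function names are illustrative and not taken from the text.

```python
import math
from itertools import combinations


def mutual_coherence(A):
    """mu(A) = max over i != j of |<a_i, a_j>| / (||a_i||_2 ||a_j||_2), columns of A."""
    m, n = len(A), len(A[0])
    cols = [[A[r][c] for r in range(m)] for c in range(n)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    return max(
        abs(sum(a * b for a, b in zip(cols[i], cols[j]))) / (norms[i] * norms[j])
        for i, j in combinations(range(n), 2)
    )


def welch_bound(m, n):
    """Lower bound sqrt((n - m) / (m (n - 1))) on mu(A) for an m x n matrix (Lemma 3.2)."""
    return math.sqrt((n - m) / (m * (n - 1)))


angles = [math.pi / 2 + 2 * math.pi * j / 3 for j in range(3)]
A = [[math.cos(t) for t in angles], [math.sin(t) for t in angles]]
print(mutual_coherence(A), welch_bound(2, 3))  # both approximately 0.5
```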



Let us mention that different variants of mutual coherence exist, in particular, the Babel function [19], the cumulative coherence function [64], the structured $p$-Babel function [3], the fusion coherence [6], and cluster coherence [22]. The notion of cluster coherence will in fact be discussed later in Section 6 for a particular application.

If the sparsity of the original vector is bounded in terms of the mutual coherence of the sensing matrix, the following result can be shown; its proof can be found in [19].

Theorem 3.3 ([30, 19]) Let $A$ be an $m \times n$ matrix, and let $x \in \mathbb{R}^n \setminus \{0\}$ be a solution of $(P_0)$ satisfying

$$\|x\|_0 < \tfrac{1}{2}\bigl(1 + \mu(A)^{-1}\bigr).$$

Then $x$ is the unique solution of $(P_0)$ and $(P_1)$.
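As a worked instance of Theorem 3.3, consider the $m \times 2m$ matrix $[\Phi_1 \mid \Phi_2]$ built from the Fourier and Dirac bases, whose coherence is $1/\sqrt{m}$ by Section 4.2. The value $m = 64$ is an illustrative choice.

```latex
\mu([\Phi_1 \mid \Phi_2]) = \frac{1}{\sqrt{64}} = \frac{1}{8}
\quad\Longrightarrow\quad
\|x\|_0 < \tfrac{1}{2}\bigl(1 + \mu^{-1}\bigr) = \tfrac{1}{2}(1 + 8) = 4.5 .
```

Hence every solution of $(P_0)$ with at most 4 non-zero entries is the unique solution of both $(P_0)$ and $(P_1)$. In general, the admissible sparsity grows only like $\sqrt{m}/2$, which is the quadratic bottleneck behind coherence-based guarantees.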

3.2.2 Restricted Isometry Property 

We next discuss the restricted isometry property, initially introduced in [11]. It measures the degree to which each submatrix consisting of $k$ column vectors of $A$ is close to being an isometry. Notice that this notion automatically ensures stability, as will become evident in the next theorem.

Definition 3.4 Let $A$ be an $m \times n$ matrix. Then $A$ has the Restricted Isometry Property (RIP) of order $k$, if there exists a $\delta_k \in (0, 1)$ such that

$$(1 - \delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k)\|x\|_2^2 \quad \text{for all } x \in \Sigma_k.$$
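Computing RIP constants is intractable in general (see Section 4), but for order $k = 2$ and unit-norm columns it is elementary. A vector with one non-zero entry is mapped isometrically. On the support $\{i, j\}$, the Gram matrix $\begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix}$ with $c = \langle a_i, a_j \rangle$ has eigenvalues $1 \pm |c|$. So the smallest admissible $\delta_2$ equals $\max |c|$, i.e., the mutual coherence $\mu(A)$. The brute-force sketch below assumes unit-norm columns and uses an illustrative matrix; the function name is hypothetical.

```python
import math
from itertools import combinations


def rip_constant_order2(A):
    """Smallest delta_2 in Definition 3.4 for k = 2, assuming unit-norm columns.

    The Gram matrix of columns i, j is [[1, c], [c, 1]], whose eigenvalues 1 + |c|
    and 1 - |c| deviate from 1 by exactly |c|; 1-sparse vectors are mapped
    isometrically, so delta_2 is the largest |c| over all column pairs.
    """
    m, n = len(A), len(A[0])
    cols = [[A[r][c] for r in range(m)] for c in range(n)]
    delta = 0.0
    for i, j in combinations(range(n), 2):
        c = sum(a * b for a, b in zip(cols[i], cols[j]))
        delta = max(delta, abs(c))
    return delta


s = 1 / math.sqrt(2)
A = [[1.0, 0.0, s],
     [0.0, 1.0, s]]  # unit-norm columns e1, e2, (e1 + e2)/sqrt(2)
print(rip_constant_order2(A))  # approximately 0.7071, equal to mu(A)
```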

Several variations of this notion were introduced in recent years, for example the fusion RIP [6] and the D-RIP [9].

Although error estimates for recovery from noisy data are also known for mutual coherence, they arise very naturally in the setting of the RIP. In fact, the error can be phrased in terms of the best $k$-term approximation (cf. Definition 2.2) as follows.



Theorem 3.4 ([11, 15]) Let $A$ be an $m \times n$ matrix which satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Let $x \in \mathbb{R}^n$, and let $\hat{x}$ be a solution of the associated $(P_1)$ problem. Then

$$\|x - \hat{x}\|_2 \le C \cdot \frac{\sigma_k(x)_1}{\sqrt{k}}$$

for some constant $C$ dependent on $\delta_{2k}$.

The best known RIP condition for sparse recovery by $(P_1)$ states that $(P_1)$ recovers all $k$-sparse vectors provided the measurement matrix $A$ satisfies $\delta_{2k} < 0.473$; see



3.3 Necessary Conditions 

Meaningful necessary conditions for '$\ell_0 = \ell_1$' in the sense of $(P_0) = (P_1)$ are significantly harder to achieve. An interesting string of research was initiated by Donoho and Tanner with the two papers [25, 26]. The main idea is to derive equivalent conditions utilizing the theory of convex polytopes. For this, let $C_n$ be defined by

$$C_n = \{x \in \mathbb{R}^n : \|x\|_1 \le 1\}. \qquad (1)$$

A condition equivalent to '$\ell_0 = \ell_1$' can then be formulated in terms of properties of a particular related polytope. For the relevant notions from polytope theory, we refer to [37].

Theorem 3.5 ([25, 26]) Let $C_n$ be defined as in (1), let $A$ be an $m \times n$ matrix, and let the polytope $P$ be defined by $P = AC_n \subseteq \mathbb{R}^m$. Then the following conditions are equivalent.

(i) The number of $k$-faces of $P$ equals the number of $k$-faces of $C_n$.

(ii) $(P_0) = (P_1)$.

The geometric intuition behind this result is the fact that the number of $k$-faces of $P$ equals the number of index sets $\Lambda \subseteq \{1, \ldots, n\}$ with $|\Lambda| = k$ such that vectors $x$ satisfying $\operatorname{supp} x = \Lambda$ can be recovered via $(P_1)$.

Extending these techniques, Donoho and Tanner were also able to provide highly accurate analytical descriptions of the phase transition that occurs when the region of exact recovery is considered as a function of two ratios: the number of equations to the number of unknowns, $m/n$, and the number of nonzeros to the number of equations, $k/m$. The interested reader is referred to [27] for further details.

4 Sensing Matrices 

Ideally, we aim for a matrix which has high spark, low mutual coherence, and a small 
RIP constant. As our discussion in this section will show, these properties are often quite 
difficult to achieve, and even computing, for instance, the RIP constant is computationally 
intractable in general (see [59]). 

In the sequel, after presenting some general relations between the introduced notions of 
spark, NSP, mutual coherence, and RIP, we will discuss some explicit constructions for, in 
particular, mutual coherence and RIP. 




4.1 Relations between Spark, NSP, Mutual Coherence, and RIP 

Before discussing different approaches to construct a sensing matrix, we first present several known relations between the introduced notions of spark, NSP, mutual coherence, and RIP. This allows one to easily compute, or at least estimate, other measures if a sensing matrix is designed for a particular measure. For the proofs of the different statements, we refer to
Chapter 1]. 



Lemma 4.1 Let $A$ be an $m \times n$ matrix with normalized columns.

(i) We have

$$\operatorname{spark}(A) \ge 1 + \frac{1}{\mu(A)}.$$

(ii) $A$ satisfies the RIP of order $k$ with $\delta_k = k\mu(A)$ for all $k < \mu(A)^{-1}$.

(iii) Suppose $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. If

$$\frac{\sqrt{2}\,\delta_{2k}}{1 - (1 + \sqrt{2})\,\delta_{2k}} < \sqrt{\frac{k}{n}},$$

then $A$ satisfies the NSP of order $2k$.

4.2 Spark and Mutual Coherence 

Let us now provide some exemplary classes of sensing matrices with advantageous spark and mutual coherence properties.

The first observation one can make (see also [15]) is that an $m \times n$ Vandermonde matrix $A$ (with distinct nodes) satisfies

$$\operatorname{spark}(A) = m + 1.$$

One serious drawback, though, is the fact that these matrices become badly conditioned as $n \to \infty$.
Turning to the weaker notion of mutual coherence, of particular interest (compare Subsection 6.1) are sensing matrices composed of two orthonormal bases or frames for $\mathbb{R}^m$. If the two orthonormal bases $\Phi_1$ and $\Phi_2$, say, are chosen to be mutually unbiased, such as the Fourier and the Dirac basis (the standard basis), then

$$\mu([\Phi_1 \mid \Phi_2]) = \frac{1}{\sqrt{m}},$$

which is the optimal bound on mutual coherence for such $m \times 2m$ sensing matrices. Other constructions are known for $m \times m^2$ matrices $A$ generated from the Alltop sequence [38] or by using Grassmannian frames [63], in which cases the lower bound of Lemma 3.2 is essentially attained:

$$\mu(A) = \frac{1}{\sqrt{m}}.$$

The number of measurements required for recovery of a $k$-sparse signal can then be determined to be $m = O(k^2 \log n)$.
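The value $\mu([\Phi_1 \mid \Phi_2]) = 1/\sqrt{m}$ for the Dirac and Fourier bases can be confirmed numerically. The sketch below assumes the unitary DFT convention $F_{rc} = e^{-2\pi i rc/m}/\sqrt{m}$ and concatenates $F$ with the identity. It then computes the coherence with the Hermitian inner product. The choice $m = 16$ and the function names are illustrative.

```python
import cmath
import math
from itertools import combinations


def dirac_fourier_columns(m):
    """Columns of [I | F], where F[r][c] = exp(-2 pi i r c / m) / sqrt(m) is the unitary DFT."""
    dirac = [[1.0 if r == c else 0.0 for r in range(m)] for c in range(m)]
    fourier = [[cmath.exp(-2j * math.pi * r * c / m) / math.sqrt(m) for r in range(m)]
               for c in range(m)]
    return dirac + fourier


def norm(v):
    """Euclidean norm of a (possibly complex) vector."""
    return math.sqrt(sum(abs(z) ** 2 for z in v))


def coherence(cols):
    """max over i != j of |<a_i, a_j>| / (||a_i|| ||a_j||), Hermitian inner product."""
    return max(
        abs(sum(a * b.conjugate() for a, b in zip(cols[i], cols[j])))
        / (norm(cols[i]) * norm(cols[j]))
        for i, j in combinations(range(len(cols)), 2)
    )


m = 16
print(coherence(dirac_fourier_columns(m)), 1 / math.sqrt(m))  # both approximately 0.25
```

Columns within each basis are orthogonal, so the maximum comes from the cross terms. Every Dirac-Fourier inner product has modulus exactly $1/\sqrt{m}$, which is what 'mutually unbiased' means.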
4.3 RIP

We begin by discussing some deterministic constructions of matrices satisfying the RIP. The first noteworthy construction was presented by DeVore and requires $m \gtrsim k^2$; see [17]. A very recent, highly sophisticated approach [5] by Bourgain et al. still requires $m \gtrsim k^{2-\alpha}$ with some small constant $\alpha > 0$. Hence, up to now, deterministic constructions require a large $m$, which is typically not feasible for applications, since it scales (almost) quadratically in $k$.

The construction of random sensing matrices satisfying RIP is a possibility to circumvent 
this problem. Such constructions are closely linked to the famous Johnson-Lindenstrauss 
Lemma, which is extensively utilized in numerical linear algebra, machine learning, and 
other areas requiring dimension reduction. 

Theorem 4.1 (Johnson-Lindenstrauss Lemma [41]) Let $\varepsilon \in (0, 1)$, let $x_1, \ldots, x_p \in \mathbb{R}^n$, and let $m = O(\varepsilon^{-2} \log p)$ be a positive integer. Then there exists a Lipschitz map $f : \mathbb{R}^n \to \mathbb{R}^m$ such that

$(1 - \varepsilon)\|x_i - x_j\|_2^2 \le \|f(x_i) - f(x_j)\|_2^2 \le (1 + \varepsilon)\|x_i - x_j\|_2^2 \quad \text{for all } i, j \in \{1, \ldots, p\}.$
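A random linear map is the standard way to realize such an $f$. The sketch below is an illustration only, assuming a Gaussian matrix with i.i.d. $N(0, 1/m)$ entries (a common choice, not prescribed by the lemma) and synthetic random points; it reports the smallest and largest ratio of squared pairwise distances after the projection, which should both be close to 1. Function names are hypothetical.

```python
# Illustrative sketch; Gaussian map and function names are assumptions, not from the survey.
import math
import random

def gaussian_matrix(m, n, rng):
    """m x n matrix (list of rows) with i.i.d. N(0, 1/m) entries, so E||Ax||^2 = ||x||^2."""
    sigma = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def jl_distortion(points, m, seed=0):
    """Draw f(x) = Ax with Gaussian A; return min and max of
    ||f(x_i) - f(x_j)||^2 / ||x_i - x_j||^2 over all pairs i < j."""
    rng = random.Random(seed)
    A = gaussian_matrix(m, len(points[0]), rng)
    images = [matvec(A, p) for p in points]
    ratios = [sq_dist(images[i], images[j]) / sq_dist(points[i], points[j])
              for i in range(len(points)) for j in range(i + 1, len(points))]
    return min(ratios), max(ratios)

# Synthetic data: p = 5 random points in R^1000, mapped to R^300.
data_rng = random.Random(1)
pts = [[data_rng.uniform(-1.0, 1.0) for _ in range(1000)] for _ in range(5)]
lo, hi = jl_distortion(pts, m=300)
print(f"squared-distance ratios lie in [{lo:.3f}, {hi:.3f}]")
```

For a fixed difference vector, the ratio is distributed as $\chi^2_m / m$ in this Gaussian model, which is exactly the kind of concentration that the inequality below formalizes.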

The key requirement for a matrix to satisfy the Johnson-Lindenstrauss Lemma with high probability is the following concentration inequality for an arbitrarily fixed $x \in \mathbb{R}^n$:

$\mathbb{P}\big((1 - \varepsilon)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \varepsilon)\|x\|_2^2\big) \ge 1 - 2e^{-c_0 \varepsilon^2 m}, \qquad (2)$

with the entries of $A$ being generated by a certain probability distribution. The relation of the RIP to the Johnson-Lindenstrauss Lemma is established in the following result. We also mention that a converse of the following theorem was recently proved in [43].



Theorem 4.2 ([3]) Let $\delta \in (0, 1)$. If the probability distribution generating the $m \times n$ matrices $A$ satisfies the concentration inequality (2) with $\varepsilon = \delta$, then there exist constants $c_1, c_2$ such that, with probability $\ge 1 - 2e^{-c_2 \delta^2 m}$, $A$ satisfies the RIP of order $k$ with $\delta$ for all $k \le c_1 \delta^2 m / \log(n/k)$.

This observation was then used in [3] to prove that Gaussian and Bernoulli random matrices satisfy the RIP of order $k$ with $\delta$ provided that $m \gtrsim \delta^{-2} k \log(n/k)$. Up to a constant, lower bounds for Gelfand widths of $\ell_1$-balls [35] show that this dependence on $k$ and $n$ is indeed optimal.
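For very small matrices the RIP constant can be computed by brute force, which makes the definition concrete and also shows why this is intractable in general (the number of supports grows like $\binom{n}{k}$). The following is a minimal sketch for order $k = 2$ only, assuming a real matrix given as a list of rows; the function name is hypothetical.

```python
# Illustrative sketch for k = 2 only; the function name is hypothetical.
import itertools
import math
import random

def rip_constant_order2(A):
    """Exact RIP constant delta_2 of A (list of rows), by enumerating all column pairs.

    For a support S = {i, j}, the extreme values of ||Ax||^2 / ||x||^2 over x supported
    on S are the eigenvalues of the 2 x 2 Gram matrix A_S^T A_S (closed form below).
    1-sparse vectors are covered as well, since diagonal entries lie between them.
    """
    m, n = len(A), len(A[0])
    cols = [[A[r][j] for r in range(m)] for j in range(n)]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    delta = 0.0
    for i, j in itertools.combinations(range(n), 2):
        a, b, c = dot(cols[i], cols[i]), dot(cols[i], cols[j]), dot(cols[j], cols[j])
        center, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
        delta = max(delta, (center + radius) - 1, 1 - (center - radius))
    return delta

# Usage: a 20 x 40 Gaussian matrix with N(0, 1/m) entries (synthetic example).
rng = random.Random(0)
m, n = 20, 40
G = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
print("delta_2 =", rip_constant_order2(G))
```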

5 Recovery Algorithms 

In this section, we will provide a brief overview of the different types of algorithms typically used for sparse recovery. Convex optimization algorithms require very few measurements but are computationally more complex. At the other extreme are combinatorial algorithms, which are very fast (often sublinear) but require many measurements that are sometimes difficult to obtain. Greedy algorithms are in some sense a good compromise between those extremes concerning computational complexity and the required number of measurements.
5.1 Convex Optimization

In Subsection 1.3 we already stated the most commonly used convex optimization problem

$\min_x \|x\|_1 \quad \text{subject to} \quad y = Ax.$

If the measurements are affected by noise, a conic constraint is required; i.e., the minimization problem needs to be changed to

$\min_x \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2^2 \le \varepsilon,$

for a carefully chosen $\varepsilon > 0$. For a particular regularization parameter $\lambda > 0$, this problem is equivalent to the unconstrained version given by

$\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1.$

Convex optimization algorithms specifically adapted to the compressed sensing setting include interior-point methods [11], projected gradient methods [33], and iterative thresholding [16]. The reader might also be interested in checking the webpages www-stat.stanford.edu/~candes/l1magic and sparselab.stanford.edu for available code. It is worth pointing out that the intense research performed in this area has slightly diminished the computational disadvantage of convex optimization algorithms for compressed sensing as compared to greedy type algorithms.
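Iterative thresholding for the unconstrained problem above can be illustrated with the basic iterative soft-thresholding algorithm (ISTA). This is a generic textbook variant, sketched under the assumption of a small dense real matrix given as a list of rows; it is not necessarily the exact scheme of [16], and the conservative step size $1/\|A\|_F^2$ is a simplifying choice. Function names are hypothetical.

```python
# Illustrative ISTA sketch; not necessarily the method of [16]. Names are hypothetical.
import math

def soft_threshold(v, tau):
    """Componentwise soft thresholding: sign(v_i) * max(|v_i| - tau, 0)."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def ista(A, y, lam, iters=500):
    """Approximate a minimizer of 0.5*||Ax - y||_2^2 + lam*||x||_1 (A: list of rows).

    Each iteration takes a gradient step on the quadratic term and then soft-thresholds.
    The step t = 1/||A||_F^2 is at most 1/||A||_2^2, which guarantees convergence.
    """
    m, n = len(A), len(A[0])
    t = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]  # y - Ax
        step = [x[j] + t * sum(A[i][j] * residual[i] for i in range(m))             # x + t A^T(y - Ax)
                for j in range(n)]
        x = soft_threshold(step, lam * t)
    return x

# Sanity check: for A = I the minimizer is soft_threshold(y, lam) in closed form.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(ista(I3, [3.0, -0.5, 1.0], lam=1.0), soft_threshold([3.0, -0.5, 1.0], 1.0))
```

The identity case is chosen only because its solution is known exactly; for a genuine underdetermined $A$ the same loop applies unchanged.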

5.2 Greedy Algorithms 

Greedy algorithms iteratively approximate the coefficients and the support of the original signal. They have the advantage of being very fast and easy to implement. Often the theoretical performance guarantees are very similar to, for instance, $\ell_1$ minimization results.

The most well-known greedy approach is Orthogonal Matching Pursuit (OMP), which is described in Figure 4. OMP was introduced in [57] as an improved successor of Matching Pursuit [51].



Interestingly, a theorem similar to Theorem 3.3 can be proven for OMP. 



Theorem 5.1 ([64], [20]) Let $A$ be an $m \times n$ matrix, and let $x \in \mathbb{R}^n \setminus \{0\}$ be a solution of $(P_0)$ satisfying

$\|x\|_0 < \frac{1}{2}\left(1 + \mu(A)^{-1}\right).$

Then OMP with error threshold $\varepsilon = 0$ recovers $x$.

Other prominent examples of greedy algorithms are Stagewise OMP (StOMP) [28], Regularized OMP (ROMP) [56], and Compressive Sampling MP (CoSaMP) [55]. For a survey of these methods, we refer to [32, Chapter 8].

An intriguing, very recently developed class of algorithms is Orthogonal Matching Pursuit with Replacement (OMPR) [40], which not only includes most iterative (hard-)thresholding algorithms as special cases but also permits the tightest known analysis in terms of RIP conditions. Extending OMPR with locality-sensitive hashing (OMPR-Hash) also leads to the first provably sub-linear algorithm for sparse recovery, see [40]. Another recent development is message passing algorithms for compressed sensing, pioneered in [23]; a survey of those can be found in [32, Chapter 9].
Input:
• Matrix $A = (a_i)_{i=1}^n \in \mathbb{R}^{m \times n}$ and vector $y \in \mathbb{R}^m$.
• Error threshold $\varepsilon$.

Algorithm:
1) Set $k = 0$.
2) Set the initial solution $x^0 = 0$.
3) Set the initial residual $r^0 = y - Ax^0 = y$.
4) Set the initial support $S^0 = \mathrm{supp}\, x^0 = \emptyset$.
5) Repeat
6) Set $k = k + 1$.
7) Choose $i_0$ such that $\min_c \|c\, a_{i_0} - r^{k-1}\|_2 \le \min_c \|c\, a_i - r^{k-1}\|_2$ for all $i$.
8) Set $S^k = S^{k-1} \cup \{i_0\}$.
9) Compute $x^k = \mathrm{argmin}_x \|Ax - y\|_2$ subject to $\mathrm{supp}\, x = S^k$.
10) Compute $r^k = y - Ax^k$.
11) until $\|r^k\|_2 < \varepsilon$.

Output:
• Approximate solution $x^k$.

Figure 4: Orthogonal Matching Pursuit (OMP): Approximation of the solution of $(P_0)$
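The steps of Figure 4 can be sketched in plain Python as follows. This is a minimal sketch, assuming a real matrix with nonzero columns given as a list of rows; all names are hypothetical. Step 7 uses that $\min_c \|c\,a_i - r\|_2^2 = \|r\|_2^2 - \langle a_i, r\rangle^2 / \|a_i\|_2^2$, so the best index maximizes $|\langle a_i, r\rangle| / \|a_i\|_2$. Step 9 is solved through the normal equations. As a floating-point safeguard (a deviation from the figure), the loop stops once $\|r\|_2 \le$ `eps` or after at most $\min(m, n)$ iterations.

```python
# Illustrative OMP sketch following Figure 4; helper names are hypothetical.
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _solve(G, b):
    """Solve the square system G z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(G)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (M[r][k] - sum(M[r][c] * z[c] for c in range(r + 1, k))) / M[r][r]
    return z

def omp(A, y, eps=1e-10, max_iter=None):
    """Orthogonal Matching Pursuit (A: list of rows with nonzero columns)."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    norms = [math.sqrt(_dot(c, c)) for c in cols]
    max_iter = min(m if max_iter is None else max_iter, n)
    x, r, S = [0.0] * n, list(y), []                      # steps 1-4
    while math.sqrt(_dot(r, r)) > eps and len(S) < max_iter:
        # Step 7: pick the column best correlated with the residual (chosen ones skipped).
        i0 = max((j for j in range(n) if j not in S),
                 key=lambda j: abs(_dot(cols[j], r)) / norms[j])
        S.append(i0)                                       # step 8
        # Step 9: least squares restricted to the support S (normal equations).
        coef = _solve([[_dot(cols[a], cols[b]) for b in S] for a in S],
                      [_dot(cols[a], y) for a in S])
        x = [0.0] * n
        for j, c in zip(S, coef):
            x[j] = c
        r = [y[i] - sum(A[i][j] * x[j] for j in S) for i in range(m)]   # step 10
    return x

# Toy example: A = [I | H/2] with a 4 x 4 Hadamard matrix H (mutual coherence 1/2).
H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
A = [[1.0 if i == j else 0.0 for j in range(4)] + [H[i][j] / 2 for j in range(4)]
     for i in range(4)]
x_true = [2.0, 0, 0, 0, 0, 3.0, 0, 0]
y = [sum(A[i][j] * x_true[j] for j in range(8)) for i in range(4)]
print(omp(A, y))
```

With coherence $1/2$, Theorem 5.1 only guarantees recovery of 1-sparse vectors here; in this particular 2-sparse instance OMP still selects index 5 and then index 0 and returns the exact coefficients, but this is not implied by the theorem.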

5.3 Combinatorial Algorithms 

These methods apply group testing to highly structured samples of the original signal, but they are used far less in compressed sensing than convex optimization and greedy algorithms. Among the various types of algorithms, we mention HHS pursuit [36] and a sublinear Fourier transform [39].

6 Applications 

We now turn to some applications of compressed sensing. We will discuss two of them in more detail, namely data separation and recovery of missing data.

6.1 Data Separation 

The data separation problem can be stated in the following way. Let $x = x_1 + x_2 \in \mathbb{R}^n$. Assuming we are just given $x$, how can we extract $x_1$ and $x_2$ from it? At first glance, this seems to be impossible, since there are two unknowns for one datum.
6.1.1 An Orthonormal Basis Approach



The first approach to applying compressed sensing techniques consists in choosing appropriate orthonormal bases $\Phi_1$ and $\Phi_2$ for $\mathbb{R}^n$ such that the coefficient vectors $\Phi_i^T x_i$ ($i = 1, 2$) are sparse. This leads to the following underdetermined linear system of equations:

$x = [\Phi_1 \mid \Phi_2] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$

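Compressed sensing now suggests solving

$\min_{c_1, c_2} \left\| \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|_1 \quad \text{subject to} \quad x = [\Phi_1 \mid \Phi_2] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}. \qquad (3)$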



If the sparse vector $[\Phi_1^T x_1, \Phi_2^T x_2]^T$ can be recovered, the data separation problem can be solved by computing

$x_1 = \Phi_1(\Phi_1^T x_1) \quad \text{and} \quad x_2 = \Phi_2(\Phi_2^T x_2).$

Obviously, separation can only be achieved provided that the components $x_1$ and $x_2$ are in some sense morphologically distinct. Notice that this property is indeed encoded in the problem if one requires incoherence of the matrix $[\Phi_1 \mid \Phi_2]$.

In fact, this type of problem can be regarded as the birth of compressed sensing, since the fundamental paper [21] by Donoho and Huo analyzed a particular data separation problem, namely the separation of sinusoids and spikes. In this setting, $x_1$ consists of $n$ samples of a continuum domain signal which is a superposition of sinusoids:

$x_1 = \left( \frac{1}{\sqrt{n}} \sum_{\omega=0}^{n-1} c_{1,\omega}\, e^{2\pi i \omega t / n} \right)_{0 \le t \le n-1}.$

Letting $\Phi_1$ be the Fourier basis, the coefficient vector

$\Phi_1^T x_1 = c_1, \quad \text{where } \Phi_1 = [\varphi_{1,0} \mid \ldots \mid \varphi_{1,n-1}] \text{ with } \varphi_{1,\omega} = \left( \frac{1}{\sqrt{n}}\, e^{2\pi i \omega t / n} \right)_{0 \le t \le n-1},$

is sparse. The vector $x_2$ consists of $n$ samples of a continuum domain signal which is a superposition of spikes, i.e., has few non-zero coefficients. Thus, letting $\Phi_2$ denote the Dirac basis (standard basis), the coefficient vector

$\Phi_2^T x_2 = x_2 = c_2$

is also sparse. Since the mutual coherence of the matrix $[\Phi_1 \mid \Phi_2]$ can be computed to be $\frac{1}{\sqrt{n}}$, Theorem 3.3 implies the following result.



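Theorem 6.1 ([21, 30]) Let $x_1, x_2$ and $\Phi_1, \Phi_2$ be defined as in the previous paragraph, and assume that $\|\Phi_1^T x_1\|_0 + \|\Phi_2^T x_2\|_0 < \frac{1}{2}(1 + \sqrt{n})$. Then

$\begin{bmatrix} \Phi_1^T x_1 \\ \Phi_2^T x_2 \end{bmatrix} = \operatorname{argmin}_{c_1, c_2} \left\| \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|_1 \quad \text{subject to} \quad x = [\Phi_1 \mid \Phi_2] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$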
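As a worked instance of Theorem 6.1 (the value $n = 256$ is an illustrative choice, not taken from the text), the coherence $1/\sqrt{n}$ translates into the following admissible total sparsity:

```latex
n = 256:\qquad \mu([\Phi_1 \mid \Phi_2]) = \frac{1}{\sqrt{256}} = \frac{1}{16},
\qquad
\|\Phi_1^T x_1\|_0 + \|\Phi_2^T x_2\|_0 < \frac{1}{2}\,(1 + 16) = 8.5 .
```

Hence any superposition of at most 8 sinusoids and spikes in total (for example, 5 sinusoids and 3 spikes) is separated exactly by (3); the admissible number grows only like $\sqrt{n}/2$.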
6.1.2 A Frame Approach

Now assume that we cannot find sparsifying orthonormal bases, but only Parseval frames $\Phi_1$ and $\Phi_2$; notice that this situation is much more likely, due to the advantageous redundancy of a frame. In this case, the minimization problem stated in (3) runs into the following difficulty. We are merely interested in the separation $x = x_1 + x_2$. However, for each such separation, due to the redundancy of the frames, the minimization problem searches through infinitely many coefficients $[c_1, c_2]$ satisfying $x_i = \Phi_i c_i$, $i = 1, 2$. Thus it not only computes much more than necessary (in fact, it even computes the sparsest coefficient sequence of $x$ with respect to the dictionary $[\Phi_1 \mid \Phi_2]$), but this also causes numerical instabilities if the redundancy of the frames is too high.

To avoid this problem, we place the $\ell_1$ norm on the analysis rather than the synthesis side, as already mentioned in Subsection 1.5. Utilizing the fact that $\Phi_1$ and $\Phi_2$ are Parseval frames, i.e., that $\Phi_i \Phi_i^T = I$ ($i = 1, 2$), we can write

$x = x_1 + x_2 = \Phi_1(\Phi_1^T x_1) + \Phi_2(\Phi_2^T x_2).$

This particular choice of coefficients, which in frame theory language are termed analysis coefficients, leads to the minimization problem

$\min_{x_1, x_2} \|\Phi_1^T x_1\|_1 + \|\Phi_2^T x_2\|_1 \quad \text{subject to} \quad x = x_1 + x_2. \qquad (4)$
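A concrete Parseval frame shows why the analysis coefficients $\Phi_i^T x_i$ are a natural choice. The sketch below is illustrative only, assuming the frame $\frac{1}{\sqrt{2}}[I \mid H/\sqrt{n}]$ built from the standard basis and a normalized Sylvester-Hadamard basis of $\mathbb{R}^n$ ($n$ a power of two); it checks $\Phi\Phi^T = I$ by synthesizing $x$ from its analysis coefficients. Function names are hypothetical.

```python
# Illustrative sketch; the frame choice and function names are assumptions.
import math

def hadamard(n):
    """Sylvester Hadamard matrix of order n (n a power of two), entries +1/-1."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def parseval_frame(n):
    """n x 2n matrix (1/sqrt 2)[I | H/sqrt n]: two orthonormal bases, jointly rescaled."""
    H, s = hadamard(n), 1.0 / math.sqrt(2.0)
    return [[s * (1.0 if i == j else 0.0) for j in range(n)]
            + [s * H[i][j] / math.sqrt(n) for j in range(n)] for i in range(n)]

def analysis(Phi, x):
    """Analysis coefficients Phi^T x."""
    return [sum(Phi[i][j] * x[i] for i in range(len(x))) for j in range(len(Phi[0]))]

def synthesis(Phi, c):
    """Synthesis Phi c."""
    return [sum(row[j] * c[j] for j in range(len(c))) for row in Phi]

Phi = parseval_frame(4)
x = [1.0, -2.0, 0.5, 3.0]
c = analysis(Phi, x)              # 8 coefficients for a vector in R^4
print(synthesis(Phi, c))          # Phi Phi^T x, which reproduces x for a Parseval frame
print(sum(v * v for v in c), sum(v * v for v in x))   # ||Phi^T x||^2 = ||x||^2
```

Since $HH^T = nI$, one has $\Phi\Phi^T = \frac{1}{2}(I + I) = I$, although the 8 frame vectors are redundant in $\mathbb{R}^4$.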

Interestingly, the associated recovery results employ structured sparsity, which is why we briefly present these notions as well. First, the notion of relative sparsity (cf. Definition 2.3) is adapted to this situation.

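Definition 6.1 Let $\Phi_1$ and $\Phi_2$ be Parseval frames for $\mathbb{R}^n$ with indexing sets $\{1, \ldots, N_1\}$ and $\{1, \ldots, N_2\}$, respectively, let $\Lambda_i \subseteq \{1, \ldots, N_i\}$, $i = 1, 2$, and let $\delta > 0$. Then the vectors $x_1$ and $x_2$ are called $\delta$-relatively sparse in $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$ and $\Lambda_2$ if

$\|\mathbb{1}_{\Lambda_1^c} \Phi_1^T x_1\|_1 + \|\mathbb{1}_{\Lambda_2^c} \Phi_2^T x_2\|_1 \le \delta.$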
Second, the notion of mutual coherence is adapted to structured sparsity, as already discussed in Subsection 3.2.1. This leads to the following definition of cluster coherence.



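Definition 6.2 Let $\Phi_1 = (\varphi_{1i})_{i=1}^{N_1}$ and $\Phi_2 = (\varphi_{2j})_{j=1}^{N_2}$ be Parseval frames for $\mathbb{R}^n$, and let $\Lambda_1 \subseteq \{1, \ldots, N_1\}$. Then the cluster coherence $\mu_c(\Lambda_1, \Phi_1; \Phi_2)$ of $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$ is defined by

$\mu_c(\Lambda_1, \Phi_1; \Phi_2) = \max_{j = 1, \ldots, N_2} \sum_{i \in \Lambda_1} |\langle \varphi_{1i}, \varphi_{2j} \rangle|.$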
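The definition can be evaluated directly for small examples. A minimal sketch, assuming (for illustration only) the standard basis and the normalized Sylvester-Hadamard basis of $\mathbb{R}^{16}$ as a pair of non-redundant Parseval frames, each stored as a list of frame vectors; the function names are hypothetical.

```python
# Illustrative sketch; frames and function names are assumptions.
import math

def hadamard(n):
    """Sylvester Hadamard matrix of order n (n a power of two), entries +1/-1."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def cluster_coherence(Lam, Phi1, Phi2):
    """mu_c(Lam, Phi1; Phi2) = max_j sum_{i in Lam} |<phi_1i, phi_2j>| (real frames)."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return max(sum(abs(dot(Phi1[i], p2)) for i in Lam) for p2 in Phi2)

n = 16
dirac = [[1.0 if t == i else 0.0 for t in range(n)] for i in range(n)]
H = hadamard(n)
walsh = [[H[t][j] / math.sqrt(n) for t in range(n)] for j in range(n)]  # orthonormal columns
for size in (1, 2, 4):
    print(size, cluster_coherence(range(size), dirac, walsh))
```

Every inner product here equals $\pm 1/4$, so $\mu_c$ grows linearly with the cluster size (0.25, 0.5 and 1.0 for $|\Lambda_1| = 1, 2, 4$). Since the error bound of Theorem 6.2 is only finite for $\mu_c < 1/2$, this particular pair admits only very small clusters.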

The performance of the minimization problem (4) can then be analyzed as follows. It should be emphasized that the clusters of significant coefficients $\Lambda_1$ and $\Lambda_2$ are a mere analysis tool; the algorithm does not take them into account. Further, notice that the choice of these sets has a highly delicate impact on the separation estimate. For the proof of the result, we refer to [22].

Recall that $\Phi$ is a Parseval frame if $\Phi\Phi^T = I$.
Theorem 6.2 ([22]) Let $x = x_1 + x_2 \in \mathbb{R}^n$, let $\Phi_1$ and $\Phi_2$ be Parseval frames for $\mathbb{R}^n$ with indexing sets $\{1, \ldots, N_1\}$ and $\{1, \ldots, N_2\}$, respectively, and let $\Lambda_i \subseteq \{1, \ldots, N_i\}$, $i = 1, 2$. Further, suppose that $x_1$ and $x_2$ are $\delta$-relatively sparse in $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$ and $\Lambda_2$, and let $[x_1^*, x_2^*]^T$ be a solution of the minimization problem (4). Then

$\|x_1^* - x_1\|_2 + \|x_2^* - x_2\|_2 \le \frac{2\delta}{1 - 2\mu_c},$

where $\mu_c = \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Lambda_2, \Phi_2; \Phi_1)\}$.


Let us finally mention that data separation via compressed sensing has been applied, for instance, in imaging sciences for the separation of point- and curvelike objects, a problem appearing in several areas, such as astronomical imaging when separating stars from filaments, and neurobiological imaging when separating spines from dendrites. Figure 5 illustrates a numerical result from [15] using wavelets (see [SD]) and shearlets (see [33 HH]) as sparsifying frames. A theoretical foundation for the separation of point- and curvelike objects by $\ell_1$ minimization is developed in [22]. When thresholding is considered as the separation method for such features, even stronger theoretical results could be proven in [45]. Moreover, a first analysis of the separation of cartoon and texture (very commonly present in natural images) was performed in




Figure 5: Separation of a neurobiological image using wavelets and shearlets 



For more details on data separation using compressed sensing techniques, we refer to 
Chapter 11]. 



6.2 Recovery of Missing Data 

The problem of recovery of missing data can be formulated as follows. Let $x = x_K + x_M \in W \oplus W^\perp$, where $W$ is a subspace of $\mathbb{R}^n$. We assume that only $x_K$ is known to us, and we aim to recover $x$. Again, this seems infeasible unless we have additional information.
6.2.1 An Orthonormal Basis Approach

We now assume that, although $x$ is not known to us, we at least know that it is sparsified by an orthonormal basis $\Phi$, say. Letting $P_W$ and $P_{W^\perp}$ denote the orthogonal projections onto $W$ and $W^\perp$, respectively, we are led to solve the underdetermined problem

$P_W \Phi c = P_W x$

for the sparse solution $c$. As in the case of data separation, from a compressed sensing viewpoint it is natural to solve

$\min_c \|c\|_1 \quad \text{subject to} \quad P_W \Phi c = P_W x. \qquad (5)$

The original vector $x$ can then be recovered via $x = \Phi c$. The solution of the inpainting problem (a terminology used for recovery of missing data in imaging science) was first considered in [31].



Application of Theorem 3.3 provides a sufficient condition for missing data recovery to 
succeed. 

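Theorem 6.3 ([19]) Let $x \in \mathbb{R}^n$, let $W$ be a subspace of $\mathbb{R}^n$, and let $\Phi$ be an orthonormal basis for $\mathbb{R}^n$. If $\|\Phi^T x\|_0 < \frac{1}{2}\left(1 + \mu(P_W \Phi)^{-1}\right)$, then

$\Phi^T x = \operatorname{argmin}_c \|c\|_1 \quad \text{subject to} \quad P_W \Phi c = P_W x.$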
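To see what the condition of Theorem 6.3 asks for in a simple case, one can compute $\mu(P_W \Phi)$ directly. A minimal sketch, assuming (for illustration only) that $W$ is spanned by the coordinate vectors of the known samples, so that $P_W$ simply zeroes the missing entries, and that $\Phi$ is the normalized Sylvester-Hadamard basis of $\mathbb{R}^8$ with one missing sample; the function names are hypothetical.

```python
# Illustrative sketch; the choice of W, Phi and the function names are assumptions.
import itertools
import math

def hadamard(n):
    """Sylvester Hadamard matrix of order n (n a power of two), entries +1/-1."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def masked_coherence(Phi_cols, known):
    """Mutual coherence of P_W Phi for W = span{e_t : t in known}.

    P_W zeroes the unknown entries; assumes no column of P_W Phi vanishes."""
    cols = [[v if t in known else 0.0 for t, v in enumerate(col)] for col in Phi_cols]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    norms = [math.sqrt(dot(c, c)) for c in cols]
    return max(abs(dot(cols[i], cols[j])) / (norms[i] * norms[j])
               for i, j in itertools.combinations(range(len(cols)), 2))

n = 8
H = hadamard(n)
Phi = [[H[t][j] / math.sqrt(n) for t in range(n)] for j in range(n)]  # orthonormal columns
known = set(range(n - 1))             # the last sample is missing
mu = masked_coherence(Phi, known)
print(mu, 0.5 * (1 + 1 / mu))         # Theorem 6.3 needs ||Phi^T x||_0 below the second value
```

Here every pair of masked columns has normalized inner product $\pm 1/7$ (the full inner product vanishes, so only the removed entry $\pm 1/8$ remains, divided by the squared norm $7/8$). The condition therefore reads $\|\Phi^T x\|_0 < \frac{1}{2}(1 + 7) = 4$, so signals with at most three nonzero Hadamard coefficients satisfy the hypothesis of Theorem 6.3.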

6.2.2 A Frame Approach 

As before, we now assume that the sparsifying system $\Phi$ is a redundant Parseval frame. The adapted version of (5), which places the $\ell_1$ norm on the analysis side, reads

$\min_{\tilde{x}} \|\Phi^T \tilde{x}\|_1 \quad \text{subject to} \quad P_W \tilde{x} = P_W x. \qquad (6)$

Employing relative sparsity and cluster coherence, an error analysis can be derived in a similar way as before. For the proof, the reader might want to consult [42].

Theorem 6.4 ([42]) Let $x \in \mathbb{R}^n$, let $\Phi$ be a Parseval frame for $\mathbb{R}^n$ with indexing set $\{1, \ldots, N\}$, and let $\Lambda \subseteq \{1, \ldots, N\}$. Further, suppose that $x$ is $\delta$-relatively sparse in $\Phi$ with respect to $\Lambda$, and let $x^*$ be a solution of the minimization problem (6). Then

$\|x^* - x\|_2 \le \frac{2\delta}{1 - 2\mu_c},$

where $\mu_c = \mu_c(\Lambda, P_{W^\perp}\Phi; \Phi)$.

6.3 Further Applications 

Other applications of compressed sensing include coding and information theory, machine learning, hyperspectral imaging, geophysical data analysis, computational biology, remote sensing, radar analysis, robotics and control, A/D conversion, and many more. Since an elaborate discussion of all these topics would go beyond the scope of this survey paper, we refer the interested reader to dsp.rice.edu/cs.